Sequencing and Raw Sequence Data Quality Control ◾ 35
Figure 1.28 shows the FASTX-toolkit tools in the “bin” directory after downloading and
extracting the compressed archive file.
FASTX-toolkit includes several tools for processing FASTQ files, as described in
Table 1.4. You can display the usage and options of any of the executable programs by
entering the program name with the “-h” option at the command-line prompt. For instance,
to display the help for “fastq_quality_filter”, enter the following on the command
line:
fastq_quality_filter -h
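As a minimal sketch of the same idea applied to several programs at once, the following shell snippet checks whether each program is on the PATH before asking it for its help. The helper name show_fastx_help is hypothetical (it is not part of FASTX-toolkit), and the three program names are only a sample taken from Table 1.4.

```shell
#!/bin/sh
# show_fastx_help is a hypothetical helper, not a FASTX-toolkit command.
# It runs "<program> -h" if the program is on the PATH; otherwise it
# prints a message to standard error and returns 1.
show_fastx_help() {
    prog="$1"
    if command -v "$prog" >/dev/null 2>&1; then
        "$prog" -h
    else
        echo "$prog: not found on PATH" >&2
        return 1
    fi
}

# Sample of program names from Table 1.4; extend the list as needed.
for tool in fastq_quality_filter fastx_trimmer fastq_to_fasta; do
    show_fastx_help "$tool" || true
done
```

If a program is reported as not found, the “bin” directory of the extracted archive is probably not on the PATH yet.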
To show how FASTQ files are processed, we will download a raw FASTQ file from the NCBI
SRA database and rename it for this exercise. The following commands create the
TABLE 1.4 FASTX-Toolkit Programs and Descriptions
Command Name                                 Description
fastq_to_fasta                               converts FASTQ files to FASTA files
fastx_quality_stats                          computes quality statistics and nucleotide distribution
fastx_collapser                              collapses identical sequences into a single sequence
fastx_uncollapser                            expands collapsed identical sequences
fastx_trimmer                                trims reads in a FASTQ file (removing barcodes or noise)
fastx_renamer                                renames the sequence identifiers in a FASTQ/A file
fastx_clipper                                removes sequencing adapters/linkers
fasta_clipping_histogram.pl                  creates a linker clipping information histogram
fastq_quality_boxplot_graph.sh               creates a quality boxplot
fastx_nucleotide_distribution_graph.sh       creates a nucleotide distribution graph
fastx_nucleotide_distribution_line_graph.sh  creates a nucleotide distribution line graph
fastx_reverse_complement                     produces the reverse complement of each sequence
fastx_barcode_splitter.pl                    splits a FASTQ/FASTA file containing multiple samples
fasta_formatter                              changes the width of sequence lines in a FASTA file
fasta_nucleotide_changer                     converts FASTA sequences between RNA and DNA
fastq_quality_filter                         filters sequences based on quality
fastq_quality_trimmer                        trims (cuts) sequences based on quality
fastx_artifacts_filter                       filters sequencing artifacts from FASTQ/A files
fastq_masker                                 masks nucleotides with “N” based on quality
FIGURE 1.28 FASTX-toolkit programs.
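To picture the FASTQ-to-FASTA conversion listed for fastq_to_fasta in Table 1.4, the sketch below performs a comparable transformation with awk on two made-up records. It is a stand-in for illustration only, not the toolkit's implementation: the function name fastq_to_fasta_sketch and the reads are invented, and the sketch assumes each record spans exactly four lines.

```shell
#!/bin/sh
# Two made-up FASTQ records (four lines each), for illustration only.
fastq_records='@read1
ACGTACGT
+
IIIIIIII
@read2
TTGCAAGC
+
IIIIHHHH'

# Hypothetical stand-in for a FASTQ-to-FASTA conversion:
# line 1 of each record becomes the FASTA header (">" replaces "@"),
# line 2 (the sequence) is kept, and the "+" and quality lines are dropped.
fastq_to_fasta_sketch() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }'
}

printf '%s\n' "$fastq_records" | fastq_to_fasta_sketch
```

With the toolkit itself, the corresponding call would typically take the form "fastq_to_fasta -i input.fastq -o output.fasta", where the file names are placeholders; the “-h” output of the program lists the options available in the installed version.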